Computational models of learning and the theories they represent
are often validated by calibrating them to human data on decision
outcomes. However, only a few models explain the process by which
these decision outcomes are reached. We argue that models of
learning should be able to reflect the process through which the
decision outcomes are reached, and that validating a model on the
process is likely to help explain both the process and the decision
outcome simultaneously. To demonstrate the proposed validation, we
use a large dataset from the Technion Prediction Tournament and an
existing Instance-based Learning Model. We present two ways of
calibrating the Model’s parameters to human data: on an outcome
measure and on a process measure. In agreement with our
expectations, we find that calibrating the Model on the process
measure helps to explain both the process and outcome measures
better than calibrating the Model on the outcome measure. These
results hold when the Model is generalized to a different dataset.
We discuss implications for explaining the process and the decision
outcomes in computational models of learning.

Unlike models in disciplines such as economics, models of decision
making in psychology often incorporate theories of the underlying
cognitive processes that lead to specific outcomes in a decision
task. For example, Instance-Based Learning Theory (IBLT; Gonzalez &
Dutt, 2011), a theory of how people make dynamic decisions, commonly
includes assumptions about how people search for information (i.e.,
the process) and how this information search helps people to arrive
at a decision (i.e., the outcome). However, many process theories
and their corresponding models are tested only at the outcome level
rather than at the process level itself (Johnson et al., 2008).
Accounting for both the decision outcomes and the process through
which these outcomes are reached is important in mathematical models
(Scheres & Sanfey, 2006), because accounting for both enables such
models to provide a better account of the observed phenomena.
Furthermore, it is also important to account for process and
decision outcomes in computational models of learning that try to
explain human decisions (Busemeyer & Diederich, 2009; Erev & Barron,
2005; Rapoport & Budescu, 1992). For example, researchers
investigating choice behavior are often interested in explaining the
overall maximization behavior (an outcome measure) and the
exploratory behavior (e.g., alternation between alternatives, a
process measure) through cognitive models, which explain how people
learn to maximize long-term rewards (Biele, Erev & Ert, 2009; Erev,
Ert, Roth, Haruvy et al., 2010; Gonzalez & Dutt, 2011).

Beyond the importance of accounting for both the decision outcome
and the process, the literature has revealed a strong relationship
between the two, where the resulting outcome is consistent with the
adopted process (Erev & Barron, 2005; Green, Price & Hamburger,
1995; Hills & Hertwig, 2010). According to Erev and Barron (2005),
one expects a strong relationship between process and decision
outcomes in cases where the decision environment is dynamic (i.e.,
repeated) and where the decision outcome is contingent upon the
process. For example, consider a repeated binary-choice task, where
choices are made repeatedly between two alternatives. One of the
alternatives is risky, with a high outcome and a low outcome. These
two outcomes occur with certain pre-defined probabilities when the
risky alternative is chosen. The other alternative is safe, with a
medium outcome that occurs with certainty (100% chance) when this
alternative is chosen. Now, if the expected value of the risky
alternative is greater than that of the safe alternative (i.e., the
risky alternative is maximizing), then participants who alternate a
lot while selecting alternatives would end up maximizing in only
about half of their choices. In fact, Hills and Hertwig (2010) show
that people seem to rely on two distinct alternation processes while
making binary choices, and these processes achieve different amounts
of maximization behavior. These arguments are relevant not only to
human decisions but also to decision making in animals. For example,
Green et al. (1995) have shown that pigeons can only learn to
maximize their outcomes by alternating between available
alternatives in a probabilistic environment involving repeated
choices between safe and risky alternatives.
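
The binary-choice example above can be made concrete with a small sketch. The payoffs and probabilities below are hypothetical values chosen only for illustration (they are not taken from any TPT problem), and the participant is assumed to switch alternatives on every trial.

```python
def expected_value(lottery):
    """Expected value of a list of (outcome, probability) pairs."""
    return sum(outcome * prob for outcome, prob in lottery)

# Hypothetical payoffs, chosen only for illustration.
risky = [(4.0, 0.8), (0.0, 0.2)]   # high and low outcomes
safe = [(3.0, 1.0)]                # medium outcome for sure

maximizing = "risky" if expected_value(risky) > expected_value(safe) else "safe"

# A participant who switches alternatives on every one of 100 trials.
choices = ["risky" if t % 2 == 0 else "safe" for t in range(100)]
max_rate = sum(c == maximizing for c in choices) / len(choices)

print(maximizing, max_rate)  # risky 0.5
```

Under strict alternation the maximizing alternative is chosen on exactly half of the trials, regardless of which alternative has the higher expected value.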

Calibrating models to both process and outcome measures from
one-time sequential sampling tasks is already common in the
literature (Ratcliff, 1978; Ratcliff & Smith, 2004). For example,
Ratcliff (1978) calibrated models to both outcome and process
measures in an old-new recognition memory task. In this task, the
outcome measure was the proportion of correct responses and the
process measure was the accumulation of evidence to a threshold for
making a response. In fact, calibrating models to both outcome and
process measures in one-time choice tasks is so common that a
software suite called the Diffusion Model Analysis Toolbox (DMAT;
Vandekerckhove & Tuerlinckx, 2007) has recently been developed for
this purpose.

In contrast, to the authors’ best knowledge, except for one study
(mentioned below), no one has explicitly calibrated models to
outcome and process measures simultaneously in dynamic decision
making tasks (Johnson, Schulte-Mecklenbeck & Willemsen, 2008).
Johnson et al. (2008) demonstrated via computational modeling that
the priority heuristic, which provides a novel account of how people
make risky choices, captures the decision outcomes; yet this
heuristic fails to account for the process measures. The general
finding is that although certain behavioral results reveal a strong
connection between the decision outcome and the process, existing
models of learning in dynamic decision tasks rarely show any
relationship between them (Dember & Fowler, 1958; Erev & Barron,
2005; Erev, Ert, Roth, Haruvy et al., 2010; Rapoport & Budescu,
1992; Rapoport, Erev, Abraham & Olson, 1997; Tolman, 1925). For
example, although the outcome results (i.e., maximization) in a
symmetrical zero-sum matching pennies game were consistent with
predictions from a reinforcement-learning algorithm, process results
(i.e., alternations between alternatives) could not be accounted for
by the algorithm (Erev & Barron, 2005; Rapoport & Budescu, 1992).
Similarly, according to Johnson et al. (2008), the priority
heuristic, a strategy to account for risky choices, fails to account
for the process measures in dynamic decision tasks.

In one study, Gonzalez and Dutt (2011) calibrated cognitive models
in the sampling paradigm (a dynamic task), where participants are
asked to sample options free of cost before making a consequential
choice for real. Gonzalez and Dutt (2011) demonstrated that a
computational model based upon IBLT (Gonzalez, Lerch & Lebiere,
2003; “IBL model” hereafter), when calibrated on the outcome
measure, was also able to explain the process measure better than
the best models known in two different experimental paradigms.
Gonzalez and Dutt (2011), however, did not calibrate their model on
the process measure as well. Thus, it remains unclear what effect
calibrating a model to the process measure, compared to the outcome
measure, has on the model’s predictions of both these measures. In
general, one expects the decision outcome to be the result of the
process (Johnson et al., 2008). Thus, calibrating models on process
measures rather than outcome measures should have benefits in
explaining both these measures at the same time.

Although it is hard to find models calibrated to outcome and process
measures in dynamic tasks, past studies have made certain
qualitative predictions of dynamic decision models (Busemeyer, 1985;
Hertwig, Barron, Weber & Erev, 2004; Lee, Zhang, Munro & Steyvers,
2011) on outcome and process measures. However, a quantitative
empirical investigation of these models on both these measures is
currently lacking and much needed in the literature. This paper
contributes to this area by investigating the benefit of calibrating
cognitive models to outcome and process data in a dynamic decision
task.

In this paper, we evaluate the role of calibrating a computational
model to either the decision outcome or the process in explaining
and predicting both these measures. Specifically, we calibrate an
IBL model (Gonzalez & Dutt, 2011) to a risk-taking measure (decision
outcome) or an alternation measure (process), and evaluate the model
fits to human data (through parameter calibration in a dataset) and
predictions (through generalization in a dataset different from
calibration). Given the hypothesized benefits of calibrating models
on process measures (Camerer & Ho, 1999; Suppes & Atkinson, 1959),
we expect that calibrating the IBL model to the alternation measure
would improve its explanation of both risk-taking and alternations
compared to calibrating it on the risk-taking measure. We use two
large human datasets, estimation and competition, that were
collected for the 2008 Technion Prediction Tournament (TPT; Erev,
Ert, Roth, Haruvy et al., 2010). We chose the TPT datasets because
the main focus of the tournament was on outcome measures, and no
attention was given to process measures (Erev, Ert, Roth, Haruvy et
al., 2010). That is because it was felt that paying less attention
to the process measures can actually help the prediction of the
outcome measures (Erev & Haruvy, 2005; Estes, 1962), which is
contrary to the hypothesis under test in this paper. Thus, this
dataset becomes an ideal choice for testing a process-measure
calibrated model’s ability to perform on the outcome measure. In
what follows, we first discuss the role of the calibration process
in computational models. Next, we present the effects of calibrating
an existing IBL model on the outcome measure or the process measure
on the explanations and predictions of one or both measures in the
TPT’s datasets. We close this paper by discussing the role of model
calibration to account for both the process and decision outcomes.

The Role of Model Calibration in Explaining Different
Measures of Performance

Calibrating a model to human data means finding the values of its
parameters that minimize the deviation between the model’s
predictions and observations on a dependent measure. In the TPT,
several influential models¹ of learning in binary choice were
calibrated and evaluated on only the outcome measure (risk-taking)
and not on the process measure (alternations). These models were
able to account for risk-taking very well; however, many of them did
not provide any way of computing the alternations (Gonzalez & Dutt,
2011). In fact, most of the competing models did not provide any way
to explain the learning process (see an extended discussion of these
models in Gonzalez and Dutt, 2011). For example, a number of models
submitted to the TPT used prospect theory (Tversky & Kahneman, 1992)
to predict choices based upon calibrated mathematical functions.
Prospect theory does not provide any mechanism that would predict
the sequential selection of options over time. In fact, only a few
recent models of repeated binary choice may account for both the
risk-taking and alternation measures simultaneously: one is the
Inertia Sampling and Weighting (I-SAW) model (Chen et al., 2011;
Nevo & Erev, 2012; Erev, Ert, Roth, Haruvy et al., 2010) and the
other is an IBL model (Gonzalez & Dutt, 2011; Gonzalez, Dutt &
Lejarraga, 2011; Lejarraga, Dutt & Gonzalez, 2012). However, these
models were calibrated on both the outcome and process measures at
the same time, which makes it difficult to evaluate the utility of
calibrating models to one of these measures.

¹ Some of these models included the two-stage sampler model, the
normalized reinforcement learning with inertia model, and the
explorative sampler with recency model (Erev, Ert, Roth, Haruvy et
al., 2010).

We expect that calibrating a model to the process measure should
generally be beneficial for the model’s ability to explain both the
process and outcome measures upon generalization to novel
conditions. Next, we provide details about the TPT datasets that we
use to evaluate the IBL model.


Method 

Risk-taking  and  Alternations  in  Technion 
Prediction  Tournament 

Competing models submitted to the TPT were evaluated according to
the generalization criterion method (Busemeyer & Wang, 2000), by
which models were calibrated on choices made by participants in 60
problems (the estimation set) and later tested in a new set of 60
problems (the competition set) with the parameters obtained from the
calibration process in the estimation set. The generalization
criterion method was believed to be a true test for models to
explain observed choice decisions. Although the TPT involved three
different experimental paradigms, we only use data from the
“E-repeated” paradigm that involved consequential choices in a
repeated binary-choice task with immediate outcome feedback on the
chosen alternative. In this paradigm, for each of the estimation and
competition sets, a sample of 100 participants was randomly assigned
to 5 groups of 20 participants each, and each group completed 12 of
the 60 problems. Each participant was instructed to repeatedly and
consequentially select between two unlabeled buttons on a computer
screen in order to maximize long-term rewards for a block of 100
trials per problem (this end point was not known to participants).
One button was associated with a risky alternative and the other
button with a safe alternative. Selecting an alternative, safe or
risky, generated an outcome for the selected alternative (thus, the
foregone outcome on the unselected alternative was not shown). The
selection of the alternative with the higher expected value, which
could be either the safe or risky button, would maximize a
participant’s long-term rewards. Therefore, choosing the maximizing
alternative across all the repeated trials would constitute the
optimal strategy in the task. Other details about the E-repeated
paradigm are reported in Erev, Ert, Roth, Haruvy et al. (2010).

The models submitted to the TPT were not provided with human data
for alternation between options (i.e., the A-rate or the process
measure), and they were evaluated only according to their ability to
account for the risk-taking behavior (i.e., the R-rate or the
outcome measure) (Erev, Ert, Roth, Haruvy et al., 2010). We
calculated the A-rate for analyses of alternations from the TPT data
(see results in Gonzalez and Dutt, 2011). First, alternations are
coded either as 1s (the respondent switched from making a risky or
safe choice in the last trial to making a safe or risky choice in
the current trial) or as 0s (the respondent simply repeated the last
trial’s choice). The proportion of alternations in each trial is
computed by averaging the alternations over 20 participants per
problem and the 60 problems in each dataset. The R-rate is the
proportion of risky choices in each trial averaged over 20
participants per problem and the 60 problems in each dataset. A
problem is defined as consisting of two alternatives, risky and
safe. In the risky alternative, there are two possible outcomes,
high and low, whose occurrence is determined by corresponding
probability values. In the safe alternative, there is one possible
outcome, medium, which occurs with a 100% chance. For calculating
the A-rate and R-rate, the averaging is done over 20 participants
because this is the number of participants who completed each
problem in the TPT (Erev, Ert, Roth, Haruvy et al., 2010).
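
The coding scheme described above can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis code: the function names and the two short choice sequences are hypothetical, and each sequence stands for one participant's choices in one problem.

```python
def alternations(choices):
    """Code each trial from the second onward as 1 (switch) or 0 (repeat)."""
    return [int(curr != prev) for prev, curr in zip(choices, choices[1:])]

def a_rate_per_trial(sequences):
    """Average the 0/1 alternation codes across sequences, trial by trial."""
    coded = [alternations(seq) for seq in sequences]
    return [sum(col) / len(col) for col in zip(*coded)]

def r_rate_per_trial(sequences):
    """Proportion of risky choices across sequences, trial by trial."""
    return [sum(c == "risky" for c in col) / len(col) for col in zip(*sequences)]

# Hypothetical data: two participants, four trials each.
sequences = [
    ["risky", "safe", "safe", "risky"],
    ["safe", "safe", "safe", "safe"],
]
print(a_rate_per_trial(sequences))  # [0.5, 0.0, 0.5]
print(r_rate_per_trial(sequences))  # [0.5, 0.0, 0.0, 0.5]
```

The A-rate has one fewer entry than the R-rate because an alternation is only defined from the second trial onward, which is consistent with reporting both curves from trial 2.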

Figure 1 shows the overall R-rate and A-rate over 99 trials, from
trial 2 to trial 100, in the estimation and competition sets. As
seen in both of these datasets, the R-rate is relatively constant
across trials, in contrast to the sharp decrease in the A-rate. The
sharp decrease in the A-rate shows a transition in the pattern of
information search across trials (Gonzalez & Dutt, 2011). Overall,
these R-rate and A-rate curves suggest that risk-taking remains
relatively steady across trials, while participants learn to
alternate less and choose one of the two alternatives more often.
Thus, the A-rate (process) is more dynamic than the R-rate (decision
outcome), and due to these differences it is likely to be harder for
a model to account for the A-rate than for the R-rate. We use the
R-rate and A-rate curves in Figure 1 to evaluate the role of model
calibration later in this paper.

An  Instance-based  Learning  Model  of  Repeated 
Binary-choice 

IBLT (Gonzalez et al., 2003) has been used as the basis for
developing computational models that capture human behavior in a
wide variety of dynamic decision making tasks. These include
dynamically complex tasks like the water purification plant task
(Gonzalez & Lebiere, 2005; Gonzalez et al., 2003; Martin, Gonzalez &
Lebiere, 2004), training paradigms of simple and complex tasks
(Gonzalez, Best, Healy, Bourne & Kole, 2010), simple
stimulus-response practice and skill acquisition tasks (Dutt,
Yamaguchi, Gonzalez & Proctor, 2009), and repeated binary-choice
tasks (Gonzalez & Dutt, 2011; Gonzalez et al., 2011; Lebiere,
Gonzalez & Martin, 2007; Lejarraga et al., 2012), among others. The
different computational applications of IBLT illustrate its
generality and ability to capture decisions from experience in
multiple contexts.

Figure 1. (A) The R-rate and A-rate across trials observed in human
data in the estimation set of the TPT between trial 2 and trial 100.
(B) The R-rate and A-rate across trials observed in human data in
the competition set of the TPT between trial 2 and trial 100.

A recent IBL model has showcased the theory’s robustness across
multiple choice tasks: a probability-learning task, a repeated
binary-choice task with fixed probabilities, and a repeated
binary-choice task with changing probabilities (Lejarraga et al.,
2012). We use this model to evaluate the effects of model
calibration to different outcome or process measures. The model’s
formulations and decision-making process are further explained in
other publications (Gonzalez & Dutt, 2011; Lejarraga et al., 2012)
and summarized in the Appendix.

This model makes choice selections between alternatives in a trial
by comparing the weighted averages of observed outcomes on each
alternative, called “blended values.” A blended value for an
alternative, safe or risky, is a function of the probability of
retrieving instances from memory multiplied by their respective
outcomes that have been observed on previous selections of the
alternative (Lebiere, 1999; Lejarraga et al., 2012). Each instance
consists of a label that identifies a decision alternative in the
task and the outcome obtained. For example, (risky, $32) is an
instance where the decision was to choose the risky alternative and
the outcome obtained was $32. The probability of retrieving an
instance from memory, which is used to compute the blended value, is
a function of its activation (Anderson & Lebiere, 1998). Each
observed outcome (represented by a corresponding instance in memory)
has an activation value that is a function of the recency and
frequency of observing the outcome plus a noise term. This
simplified activation equation has been shown to be sufficient for
explaining human choices in several experiential tasks (Gonzalez &
Dutt, 2011; Lejarraga et al., 2012). The activation is influenced by
the decay parameter d, which captures the rate of forgetting or the
reliance on recency and frequency of observing outcomes. The higher
the value of the d parameter, the greater the model’s reliance on
outcomes experienced recently. The activation is also influenced by
a noise parameter s that is important for capturing the variability
in human behavior from one participant to another. IBL borrows the d
and s parameters and the activation equation from a popular
cognitive framework called ACT-R (Adaptive Control of Thought -
Rational; Anderson & Lebiere, 1998). However, unlike ACT-R, where
the d and s parameters are kept fixed, we calibrate the values of
these parameters in the IBL model to account for choices in human
data. The model equations for blending and activation are included
in the Appendix.
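
The verbal description above can be sketched as follows. Because the Appendix is not reproduced in this section, the exact functional forms here are assumptions: the activation is taken to be the log of summed power-law decays plus logistic-style noise, and the retrieval probability a softmax over activations with temperature s√2, which are the ACT-R-style forms commonly reported for IBL models. The function names, the noise floor, and the example instance history are hypothetical, and the Appendix remains the authoritative statement of the equations.

```python
import math
import random

def activation(times_seen, t_now, d, s, rng):
    """Recency/frequency activation plus noise (assumed ACT-R-style form)."""
    base = math.log(sum((t_now - t) ** (-d) for t in times_seen))
    gamma = min(max(rng.random(), 1e-12), 1 - 1e-12)  # keep the log finite
    return base + s * math.log((1 - gamma) / gamma)

def blended_value(instances, t_now, d, s, rng):
    """Retrieval-probability-weighted average of outcomes for one alternative.

    `instances` maps each observed outcome to the trials on which it was seen.
    """
    tau = max(s * math.sqrt(2), 1e-9)  # assumed temperature; floor avoids /0
    acts = {x: activation(ts, t_now, d, s, rng) for x, ts in instances.items()}
    top = max(acts.values())           # subtract the max for numerical stability
    weights = {x: math.exp((a - top) / tau) for x, a in acts.items()}
    total = sum(weights.values())
    return sum(x * w / total for x, w in weights.items())

# Hypothetical history for the risky alternative: $32 on trials 1-3, $0 on trial 4.
risky = {32.0: [1, 2, 3], 0.0: [4]}
rng = random.Random(0)
print(blended_value(risky, 5, d=5.0, s=0.0, rng=rng))  # recency dominates: 0.0
print(blended_value(risky, 5, d=0.1, s=0.0, rng=rng))  # frequency dominates: 32.0
```

With the noise switched off, a high d makes the single most recent outcome dominate the blended value, while a low d lets the more frequent outcome dominate, which matches the interpretation of d given in the text.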


Results 

Model  Calibration  to  Different  Measures 

We used a genetic algorithm program to calibrate the model’s
parameters to minimize the mean squared deviation (MSD) between its
predictions and the observed average A-rate per problem or the
average R-rate per problem. The average R-rate per problem and the
average A-rate per problem were computed by averaging the risky
choices and alternations in each problem over 20 participants per
problem and 100 trials per problem (for a problem’s definition,
please see the description above). The MSDs were then calculated
across the 60 estimation set problems by using the average R-rate
per problem and the average A-rate per problem from the model and
human data. For calibration, both the s and the d parameters were
varied between 0.0 and 10.0 and the genetic algorithm was run for
500 generations (crossover rate = 50%; mutation rate = 10%). The
assumed range of variation for the s and d parameters and the number
of generations in the genetic algorithm are large, which helps
ensure that the optimization process does not miss the minimum MSD
value due to a small range of parameter variation (for more details
about genetic algorithm optimization, please see Gonzalez & Dutt,
2011). We calibrated the IBL model separately on the R-rate and the
A-rate measures, and the optimized values of the d and s parameters
were determined for each calibration.
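
A minimal sketch of this kind of calibration is shown below. Only the parameter range (0.0 to 10.0), the 500 generations, and the crossover and mutation rates come from the text; the population size, the selection scheme, the Gaussian mutation step, and the `toy_model` stand-in are assumptions made for illustration and do not describe the genetic algorithm program actually used.

```python
import random

def msd(predicted, observed):
    """Mean squared deviation between model and human per-problem rates."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def calibrate(model, observed, generations=500, pop_size=20,
              low=0.0, high=10.0, crossover=0.5, mutation=0.1, seed=0):
    """Search (d, s) in [low, high] for the pair that minimizes the MSD."""
    rng = random.Random(seed)

    def cost(ind):
        return msd(model(*ind), observed)

    pop = [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]       # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = list(a)
            if rng.random() < crossover:     # take the s gene from the other parent
                child[1] = b[1]
            for i in range(2):
                if rng.random() < mutation:  # small Gaussian nudge, clipped to range
                    child[i] = min(high, max(low, child[i] + rng.gauss(0.0, 0.5)))
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=cost)

def toy_model(d, s):
    """Hypothetical stand-in for the IBL model: two per-problem rates."""
    return [d / 10.0, s / 10.0]

best_d, best_s = calibrate(toy_model, observed=[0.5, 0.15])
print(best_d, best_s, msd(toy_model(best_d, best_s), [0.5, 0.15]))
```

To calibrate on a different measure, only the `observed` vector and the model output change (per-problem A-rates instead of per-problem R-rates); the search itself stays the same.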

The model calibrated on the R-rate produced the smallest MSD for
d = 5.00 and s = 1.50. These parameters have the same optimal values
as reported by Lejarraga et al. (2012), who had also calibrated this
IBL model on the R-rate measure on the same dataset. As documented
by Lejarraga et al. (2012), the values of both the d and s
parameters are high compared to the ACT-R default values of d = 0.5
and s = 0.25 (Anderson & Lebiere, 1998). Furthermore, the model
calibrated on the A-rate produced the smallest MSD for d = 9.74 and
s = 0.96. Thus, calibrating the model on the A-rate produces a
greater value for the d parameter and a slightly smaller value for
the s parameter. The greater d parameter value suggests a high
dependency on recently experienced outcomes to make choice
decisions.

Figure 2 shows the MSDs for the R-rate and the A-rate from the IBL
model that was calibrated on the R-rate or the A-rate in the
estimation set. When model parameters were calibrated on the R-rate
(i.e., d = 5.0 and s = 1.5), the model explained the R-rate quite
well (MSD = 0.008), but it explained the A-rate less well (MSD =
0.063). Thus, the model explains the outcome measure well when
calibrated on the outcome measure, but it explains the process
measure less well. In contrast, when the IBL model parameters are
calibrated on the A-rate, the model explains the A-rate much better
(MSD = 0.002) and the resulting R-rate also relatively well (MSD =
0.023). Thus, the improvement in the A-rate MSD from calibrating the
model on the A-rate (0.063 - 0.002 = 0.061) is larger than the
accompanying deterioration in the R-rate MSD (0.023 - 0.008 =
0.015). Overall, these results show that by calibrating the IBL
model to the process measure, one is able to explain both the
process and outcome measures better than by calibrating the IBL
model to the outcome measure. Thus, these results suggest that the
components of the IBL model are good representations of the A-rate
process as well as the R-rate decision outcomes, especially given
that accounting for the A-rate is more challenging than accounting
for the R-rate because the A-rate is more dynamic than the R-rate
(Gonzalez & Dutt, 2011).

Figure 2. The MSD for the R-rate per problem and A-rate per problem
in the estimation set of the TPT. The model was either calibrated on
the R-rate per problem or calibrated on the A-rate per problem in
the estimation set. The calibrated values of the d and s parameters
obtained for each measure (R-rate or A-rate per problem) are shown
in brackets. The differences from calibrating on the A-rate measure
(respectively, the R-rate measure) are shown by two vertical arrows.

Figure 3 presents the human and model R-rate and A-rate across
trials when the model was calibrated to the R-rate (Figure 3A) and
when it was calibrated to the A-rate (Figure 3B). Here, it can be
observed how the model explains the human learning data better for
the measure used to calibrate the model.

Generalizing the Calibrated IBL model to the
Competition set

The demonstration that calibrating a model to a process measure
helps explain both the process and outcome measures is an important
way to corroborate the consistency of predictions from cognitive
models. A robust model should be able to explain the learning
process, as well as the outcomes resulting from that very process.

According to Lebiere, Gonzalez, and Warwick (2009), models that
explain only the outcome and not the process behavior might find it
difficult to generalize their predictions to novel conditions. Here,
we used the generalization criterion test (Ahn, Busemeyer,
Wagenmakers, & Stout, 2009; Busemeyer & Wang, 2000) to investigate
the predictions that the different calibration procedures can make
in novel datasets: we ran the calibrated models in novel conditions
to evaluate and compare performance. The model calibrated to the
TPT’s estimation set on the R-rate or the A-rate was generalized to
the TPT’s competition set by keeping the same parameter values that
were derived during calibration. The model was run using 20
participants per problem and 60 problems in the competition set. The
estimation and competition sets used different sets of problems.
Also, these problems were run as part of two separate experiments
involving different human participants. Given these differences, one
expects poorer performance from both models in the competition set
compared to the estimation set. However, as the algorithm used to
generate problems in the competition set was the same as that used
to generate problems in the estimation set, one also expects both
models to show results that are similar to those found for the
estimation set: the model calibrated to the process measure should
explain both the process and outcome measures better than the model
calibrated to the outcome measure.
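
The generalization criterion procedure described above amounts to fitting parameters on one set of problems and then scoring the frozen parameters on another. The sketch below illustrates only that logic: the linear `toy_model`, the problem values, the observed rates, and the small grid search (standing in for the genetic algorithm) are all hypothetical.

```python
def msd(predicted, observed):
    """Mean squared deviation between predicted and observed rates."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def calibrate_grid(model, problems, observed, grid):
    """Pick the (d, s) pair on the grid with the lowest MSD on the calibration set."""
    return min(grid, key=lambda p: msd([model(x, *p) for x in problems], observed))

def generalize(model, params, problems, observed):
    """Score frozen parameters on problems that were not used for calibration."""
    return msd([model(x, *params) for x in problems], observed)

def toy_model(problem, d, s):
    """Hypothetical stand-in for the IBL model: one predicted rate per problem."""
    return d * problem + s

grid = [(d, s) for d in (0.1, 0.2, 0.3) for s in (0.0, 0.1, 0.2)]

# Hypothetical "estimation set" and "competition set".
est_problems, est_observed = [0.0, 1.0, 2.0], [0.1, 0.3, 0.5]
comp_problems, comp_observed = [3.0, 4.0], [0.7, 0.85]

best = calibrate_grid(toy_model, est_problems, est_observed, grid)
print(best)                                                        # (0.2, 0.1)
print(generalize(toy_model, best, comp_problems, comp_observed))   # about 0.00125
```

No parameter is re-estimated in the second step, so the competition-set MSD reflects prediction rather than fit.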

Figure 4 shows the resulting MSDs from generalizing
the IBL model to the competition set. The model that
was calibrated on the estimation set’s R-rate resulted in
the best predictions for the same measure in the
competition set (MSD = 0.006); however, its predictions
for the A-rate were relatively inferior (MSD = 0.074).
Furthermore, the model that was calibrated on the A-rate
resulted in the best predictions for the same measure in
the competition set (MSD = 0.006), with reasonably good
predictions for the R-rate (MSD = 0.032). Thus, again,
the improvement in MSD for the A-rate (0.068) is larger
than the decrement in MSD for the R-rate (0.026). Also
note that the models generally perform more poorly
(higher MSDs) in the competition set (Figure 4) than in
the estimation set (Figure 2).
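
The trade-off reported above can be rechecked with a few lines of Python. This is a minimal sketch: the `msd` helper assumes that the MSD is the mean, over problems, of the squared difference between the model's and the humans' rate, and the names `msd`, `reported`, `a_rate_gain`, and `r_rate_loss` are illustrative rather than taken from the original analysis. Only the four MSD values come from the text.

```python
def msd(predicted, observed):
    """Mean squared deviation between model and human rates (one value per problem)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equally long, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# MSDs reported in the text for the competition set (Figure 4).
reported = {
    "calibrated_on_R": {"R": 0.006, "A": 0.074},
    "calibrated_on_A": {"R": 0.032, "A": 0.006},
}

# Gain on the process measure when calibrating on the A-rate instead of the R-rate.
a_rate_gain = reported["calibrated_on_R"]["A"] - reported["calibrated_on_A"]["A"]
# Loss on the outcome measure caused by the same switch.
r_rate_loss = reported["calibrated_on_A"]["R"] - reported["calibrated_on_R"]["R"]

print(round(a_rate_gain, 3), round(r_rate_loss, 3))
```

The gain on the A-rate is more than twice the loss on the R-rate, which is the asymmetry the text relies on.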

Figure 3. The R-rate and A-rate across trials predicted by the IBL model and observed in human data in the TPT’s estimation set.
Panels A and B show the results of calibrating the IBL model to the R-rate per problem and the A-rate per problem, respectively.

As in the estimation set, these results translate to
the process of learning over trials (see Figure 5). The
model’s predictions are best for the measure on which
it was calibrated in the estimation set. The model that
was calibrated on the R-rate in the estimation set
predicted the R-rate better than the A-rate (Figure 5A);
however, the model that was calibrated on the A-rate in
the estimation set predicted both the R-rate and A-rate
over time quite well (Figure 5B).

Figure 4. The MSD for the R-rate per problem and A-rate per
problem in the competition set of the TPT. The model was either
calibrated on the R-rate per problem or calibrated on the A-rate
per problem in the estimation set. The calibrated values of the d
and s parameters obtained for each measure (R-rate or A-rate per
problem) in the estimation set are shown in brackets. The
differences in MSD between calibrating on the A-rate measure and
on the R-rate measure are shown by two vertical arrows.


Discussion 

We argue that strong and robust models of human
behavior need to explain both the decision outcome and
the process from which that outcome came about. We
suggest that many models of human behavior, particularly
in the context of repeated choice and dynamic decisions
from experience, have focused only on predicting outcomes
and not the process. Furthermore, most of the existing
computational models of experiential decisions explain
the decision outcomes while completely ignoring or
failing to account for the process through which these
decision outcomes are reached (see a review of models in
Gonzalez & Dutt, 2011). This observation is perhaps not
a coincidence, because predicting an outcome as the
result of a process is very challenging (Erev & Barron,
2005; Rapoport et al., 1997).

Our findings demonstrate the robustness of explaining
and predicting outcome and process measures through
an IBL model. We demonstrated a method for assessing
a cognitive model’s ability to explain both the process
and the decision outcomes. Calibrating the model on the
process measure reduced the MSD for the A-rate (process)
by a large amount without a large deterioration in the
MSD for the R-rate (decision outcome). The proposed
calibration also helped account for both of these
measures after the model was generalized to a novel
condition.

Figure 5. The generalization of the IBL model in the TPT’s competition set. (A) The model’s parameters were calibrated on the R-rate
per problem measure in the TPT’s estimation set. (B) The model’s parameters were calibrated on the A-rate per problem measure in the
TPT’s estimation set.

Explaining both the process and decision outcomes
is important, because doing so will improve our
understanding of how people maximize long-term goals
through the process of sequential choices from
experience. Several recent model-comparison competitions
have suggested the use of different dependent measures
for calibrating models without a clear motivation for
choosing one measure over the other. For example, the
measure of model evaluation in the TPT was solely
risk-taking, i.e., decision outcomes (Erev & Barron,
2005); however, the measure of evaluation in the recently
concluded market-entry competition (Erev, Ert, & Roth,
2010) was a combination of risk-taking (outcome) and
alternations (process). Our analysis suggests that
stronger and more robust models of learning should be
able to explain both the decision outcomes and the
process by which these outcomes came about. Future
model-comparison efforts should enforce both types of
measures.

In this paper, we used one IBL model to showcase
the benefits of calibrating models on a process measure
compared to an outcome measure. This attempt may be
limited at present, as we only used one model, IBL, on
two datasets. However, this attempt does showcase the
wider generalizability of the theory, IBLT, which has
been used in the literature to derive a number of models
for a number of decision tasks (see Gonzalez, in press;
Gonzalez, 2013, for more arguments).

As part of our future research, we would like to build
on our current finding by calibrating and evaluating
models on both the outcome and process measures in
various tasks that differ in their outcome feedback and
dynamics. We would also like to consider the mutual
benefits of calibrating models to both process and
decision outcomes, especially when there are more than
two measures. It would be interesting to observe the
extent to which the benefits of calibrating models to
different kinds of process measures carry over to
different kinds of decision outcomes. When there are
more than two measures, one could combine multiple
process and outcome measures by computing a weighted
sum of the mean-squared deviations calculated on these
measures. One could set the weights such that all
combined measures are weighted equally during
optimization. Furthermore, it would be interesting to
observe how calibrating models to the process measures
carries over to the outcome measures when the calibration
is done at the individual level rather than at the
aggregate level. These evaluations would help extend our
existing knowledge on this topic and help us explore the
benefits and limitations of computational models in
explaining both the decision outcomes and the process
through which these outcomes are reached.
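
The weighted-sum idea can be sketched as a single calibration objective. The following Python is illustrative only: the function names, the dictionary layout (measure name mapped to per-problem values), and the default of equal weights are assumptions, not part of the analysis reported here. Equal weights treat the measures equally only when they share a scale, as the R-rate and A-rate do (both are proportions between 0 and 1).

```python
def msd(predicted, observed):
    """Mean squared deviation between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equally long, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def combined_msd(predicted, observed, weights=None):
    """Weighted sum of per-measure MSDs.

    predicted / observed: dict mapping a measure name (e.g. "R", "A")
    to a list of per-problem values. With weights=None every measure
    gets the same weight, 1 / (number of measures).
    """
    names = list(observed)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    return sum(weights[name] * msd(predicted[name], observed[name]) for name in names)


# Toy values, made up for illustration (not data from the TPT).
model = {"R": [0.5, 0.5], "A": [0.2, 0.2]}
human = {"R": [0.5, 0.3], "A": [0.2, 0.4]}
print(combined_msd(model, human))
```

An optimizer would then search the model's d and s parameters for the values that minimize `combined_msd`, instead of minimizing the MSD of one measure alone.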
Appendix

Decision  Rule 

In trial t+1, the model chooses the alternative with
the highest blended value, as defined in Equation 1
(below).

Blending  and  activation  mechanisms 

The blended value of alternative j is defined as

$$V_j = \sum_{i=1}^{n} p_i x_i \qquad (1)$$

where $x_i$ is the value of the observed outcome in the
outcome slot of an instance i corresponding to the
alternative j, and $p_i$ is the probability of that
instance’s retrieval from memory (for our binary-choice
task in the experience condition, the value of j in
Equation 1 could be either Risky or Safe). The blended
value of an alternative is the sum of all observed
outcomes $x_i$ in the outcome slot of corresponding
instances, weighted by the instances’ probability of
retrieval.
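
The blending step is a probability-weighted average, which a short Python sketch makes concrete. The function name and the example numbers are hypothetical; the sketch assumes the retrieval probabilities for one alternative are already known and sum to 1.

```python
def blended_value(outcomes, retrieval_probs):
    """Blended value of one alternative: sum of p_i * x_i over its instances."""
    if len(outcomes) != len(retrieval_probs):
        raise ValueError("one retrieval probability is needed per outcome")
    return sum(p * x for p, x in zip(retrieval_probs, outcomes))


# Hypothetical Risky alternative: outcome 4 retrieved with probability 0.8,
# outcome 0 retrieved with probability 0.2.
risky = blended_value([4.0, 0.0], [0.8, 0.2])
# Hypothetical Safe alternative with a single stored outcome of 3.
safe = blended_value([3.0], [1.0])
choice = "Risky" if risky > safe else "Safe"
```

With these made-up numbers the Risky alternative has the higher blended value (3.2 against 3.0), so the decision rule would select it.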

Probability  of  Retrieving  Instances 

In any trial t, the probability of retrieving instance i
from memory is a function of that instance’s activation
relative to the activation of all other instances
corresponding to that alternative, given by

$$p_{i,t} = \frac{e^{A_{i,t}/\tau}}{\sum_{l} e^{A_{l,t}/\tau}} \qquad (2)$$

where the sum runs over all instances l of that
alternative, $\tau$ is random noise defined as
$s \times \sqrt{2}$, and s is a free noise parameter.
The noise parameter s captures the imprecision of
retrieving instances from memory.
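
As a sketch of how retrieval probabilities could be computed, the Python below assumes the Boltzmann (softmax) form commonly used in IBL models, in which each instance's weight is exp(A / τ) with τ = s × √2. The function name is hypothetical, and subtracting the largest activation is only a numerical-stability measure that leaves the probabilities unchanged.

```python
import math


def retrieval_probabilities(activations, s):
    """Softmax over the activations of one alternative's instances, with tau = s * sqrt(2)."""
    if s <= 0:
        raise ValueError("the noise parameter s must be positive")
    tau = s * math.sqrt(2)
    top = max(activations)  # shift by the maximum to avoid overflow in exp
    weights = [math.exp((a - top) / tau) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]


equal = retrieval_probabilities([0.0, 0.0], s=1.0)
skewed = retrieval_probabilities([1.0, 0.0], s=1.0)
```

Equal activations yield equal probabilities, and the more active instance is the more likely one to be retrieved. A larger s flattens the distribution, which is one way to read "imprecision of retrieving instances from memory".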

Activation  of  Instances 

The activation of each instance in memory depends
upon the activation mechanism originally proposed in
ACT-R [2]. According to this mechanism, for each
trial t, the activation $A_{i,t}$ of instance i is:

$$A_{i,t} = \ln\left(\sum_{t_i \in \{1,\ldots,t-1\}} (t - t_i)^{-d}\right) + s \times \ln\left(\frac{1 - \gamma_{i,t}}{\gamma_{i,t}}\right) \qquad (3)$$

where d is a free decay parameter, and $t_i$ is a
previous trial when the instance i was created or its
activation was reinforced due to an outcome observed in
the task (the instance i is the one that has the
observed outcome as the value in its outcome slot). The
summation includes as many terms as the number of times
the outcome has been observed in previous trials and the
corresponding instance i’s activation has been reinforced
in memory (by encoding a timestamp of the trial $t_i$).
Therefore, the activation of an instance corresponding to
an observed outcome increases with the frequency of
observation and with the recency of those observations.
The decay parameter d affects the activation of an
instance directly, as it captures the rate of forgetting
or reliance on recency.

Noise  in  Activation 

The $\gamma_{i,t}$ term is a random draw from a uniform
distribution U(0, 1), and the
$s \times \ln\left(\frac{1 - \gamma_{i,t}}{\gamma_{i,t}}\right)$
term represents Gaussian noise important for capturing
the variability of human behavior.
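
The activation mechanism and its noise term can be sketched together in Python. The function name, argument names, and example timestamps are hypothetical. The clamp on the uniform draw is an added safeguard, because `random.random()` can return exactly 0. Mathematically, transforming a uniform draw in this way gives logistic-distributed noise, which is often treated as an approximation to Gaussian noise.

```python
import math
import random


def activation(timestamps, t, d, s, rng=None):
    """Activation of one instance at trial t.

    timestamps: earlier trials (all < t) at which the instance was
                created or reinforced.
    d: decay parameter; s: noise parameter.
    """
    rng = rng or random.Random()
    # Base-level term: grows with frequency and recency of observations.
    base = math.log(sum((t - ti) ** (-d) for ti in timestamps))
    # Uniform draw, clamped away from 0 and 1 so the logarithm stays finite.
    gamma = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    noise = s * math.log((1.0 - gamma) / gamma)
    return base + noise


# With s = 0 the noise vanishes and only frequency and recency matter.
frequent = activation([1, 2, 3], t=4, d=0.5, s=0.0)
recent = activation([3], t=4, d=0.5, s=0.0)
old = activation([1], t=4, d=0.5, s=0.0)
```

In this noise-free comparison the instance observed three times has the highest activation, the one observed once on the previous trial comes next, and the one observed once three trials ago is lowest.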

Pre-populated  Instances  in  Memory 

For the first trial, the IBL model does not have any
instances in memory from which to calculate blended
values. Therefore, the model is made to select between
instances that are pre-populated in memory. Lejarraga,
Dutt, and Gonzalez [23] used a value of +30 in the
outcome slot of the two alternatives’ instances. The +30
value is arbitrary, but most importantly, it is greater
than any possible outcome in the TPT problems and will
trigger an initial exploration of the two alternatives.
We use these pre-populated values in the model in this
paper.
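
To illustrate why the pre-populated +30 instances trigger initial exploration, here is a minimal sketch of an IBL-style agent in Python. It is not the model's actual implementation. The class and method names, the default d and s values (arbitrary, not the calibrated ones), the timestamp of 0 for pre-populated instances, the softmax retrieval rule with τ = s × √2, and random tie-breaking are all assumptions made for the example.

```python
import math
import random


class IBLAgent:
    """Toy instance-based learner for a two-alternative repeated-choice task."""

    def __init__(self, options=("Risky", "Safe"), d=5.0, s=1.5,
                 prepopulated=30.0, seed=None):
        self.d, self.s = d, s
        self.rng = random.Random(seed)
        # memory[option][outcome] -> trials at which that outcome was stored.
        # Pre-populated instances are given timestamp 0 (an assumption).
        self.memory = {opt: {prepopulated: [0]} for opt in options}
        self.t = 1  # current trial

    def _activation(self, timestamps):
        base = math.log(sum((self.t - ti) ** (-self.d) for ti in timestamps))
        gamma = min(max(self.rng.random(), 1e-12), 1.0 - 1e-12)
        return base + self.s * math.log((1.0 - gamma) / gamma)

    def blended_value(self, option):
        acts = {x: self._activation(ts) for x, ts in self.memory[option].items()}
        tau = self.s * math.sqrt(2)
        top = max(acts.values())
        weights = {x: math.exp((a - top) / tau) for x, a in acts.items()}
        total = sum(weights.values())
        return sum(x * w / total for x, w in weights.items())

    def choose(self):
        values = {opt: self.blended_value(opt) for opt in self.memory}
        best = max(values.values())
        # Ties (as on trial 1) are broken at random.
        return self.rng.choice([o for o, v in values.items() if v == best])

    def observe(self, option, outcome):
        self.memory[option].setdefault(outcome, []).append(self.t)
        self.t += 1


agent = IBLAgent(seed=1)
first = agent.choose()      # both blended values equal 30, so the pick is random
agent.observe(first, 4.0)   # any real outcome is below 30
second = agent.choose()     # the untried alternative still has blended value 30
```

After the first observation, the chosen alternative's blended value is a mixture of 30 and the observed outcome and therefore falls below 30, while the untried alternative still blends to exactly 30. The agent consequently samples the other alternative on the second trial.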